Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis

Authors

  • Tamás Gábor Csapó
  • Géza Németh
  • Milos Cernak
Abstract

In statistical parametric speech synthesis, creaky voice can cause disturbing artifacts. The reason is that standard pitch tracking algorithms tend to measure F0 erroneously in regions of creaky voice. This pattern is learned during the training of hidden Markov models (HMMs). In the synthesis phase, false voiced/unvoiced decisions caused by creaky voice result in audible quality degradation. To eliminate this phenomenon, we use a simple continuous F0 tracker which does not apply a strict voiced/unvoiced decision. In the proposed residual-based vocoder, Maximum Voiced Frequency is used for mixed voiced and unvoiced excitation. As all parameters of the vocoder are continuous, Multi-Space Distribution is not necessary when training the HMMs, which has been shown to be advantageous. Artifacts caused by creaky voice are eliminated with this speech synthesis system. A subjective listening test of English utterances has shown improvement over the traditional excitation.
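As a rough illustration of how a continuous Maximum Voiced Frequency (MVF) value can stand in for a binary voiced/unvoiced flag, the sketch below builds one excitation frame from harmonics of F0 below the MVF and noise-like components above it. This is not the authors' vocoder: the abstract describes a residual-based excitation, whereas this example swaps in plain sums of sinusoids to stay self-contained. The function name `mixed_excitation_frame`, the 16 kHz sampling rate, the 80-sample frame, and the 100 Hz spacing of the noise components are all assumptions made for illustration.

```python
import math
import random


def mixed_excitation_frame(f0_hz, mvf_hz, fs_hz=16000, n_samples=80,
                           noise_step_hz=100.0, seed=0):
    """Hypothetical sketch: one frame of mixed excitation split at the MVF.

    Below the MVF the frame contains harmonics of F0 (periodic part);
    above it, random-phase sinusoids approximate noise (aperiodic part).
    """
    rng = random.Random(seed)
    nyquist = fs_hz / 2.0
    upper = min(mvf_hz, nyquist)

    # Periodic band: harmonics k * F0 that do not exceed the MVF.
    n_harm = int(upper // f0_hz) if f0_hz > 0 else 0
    harmonics = [k * f0_hz for k in range(1, n_harm + 1)]

    # Aperiodic band: components between the MVF and the Nyquist frequency.
    noise_freqs = []
    j = 1
    while upper + j * noise_step_hz < nyquist:
        noise_freqs.append(upper + j * noise_step_hz)
        j += 1
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in noise_freqs]

    frame = []
    for n in range(n_samples):
        t = n / fs_hz
        voiced = sum(math.cos(2.0 * math.pi * h * t) for h in harmonics)
        noise = sum(math.cos(2.0 * math.pi * f * t + p)
                    for f, p in zip(noise_freqs, phases))
        frame.append(voiced + noise)
    return frame, len(harmonics), len(noise_freqs)


# F0 is defined for every frame; the MVF alone sets the periodic/noise balance.
_, n_periodic, n_noise = mixed_excitation_frame(100.0, 4000.0)
_, n_periodic_low, n_noise_low = mixed_excitation_frame(100.0, 0.0)
```

With an MVF of 0 Hz the periodic band is empty and the frame is entirely noise-like, so a frame that a conventional tracker would label "unvoiced" is handled by the same continuous parameters rather than by a separate decision.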

Similar resources

Discontinuous Observation HMM for Prosodic-Event-Based F0 Generation

This paper examines F0 modeling and generation techniques for spontaneous speech synthesis. In the previous study, we proposed a prosodic-unit HMM where the synthesis unit is defined as a segment between two prosodic events represented by a ToBI label framework. To take advantage of the prosodic-unit HMM, continuous F0 sequences must be modeled from discontinuous F0 data including unvoiced r...

Parameterization of vocal fry in HMM-based speech synthesis

HMM-based speech synthesis offers a way to generate speech with different voice qualities. However, sometimes databases contain certain inherent voice qualities that need to be parametrized properly. One example of this is vocal fry typically occurring at the end of utterances. A popular mixed excitation vocoder for HMM-based speech synthesis is STRAIGHT. The standard STRAIGHT is optimized for ...

An excitation model for HMM-based speech synthesis based on residual modeling

This paper describes a trainable excitation approach to eliminate the unnaturalness of HMM-based speech synthesizers. During the waveform generation part, mixed excitation is constructed by state-dependent filtering of pulse trains and white noise sequences. In the training part, filters and pulse trains are jointly optimized through a procedure which resembles analysis-by-synthesis speech codin...

A hierarchical F0 modeling method for HMM-based speech synthesis

The conventional state-based F0 modeling in HMM-based speech synthesis systems is good at capturing micro-prosodic features, but has difficulty characterizing long-term pitch patterns directly. This paper presents a hierarchical F0 modeling method to address this issue. In this method, different F0 models are used to model the pitch patterns for different prosodic layers (including state, phone, syl...

Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis

This paper describes an excitation model based on the amplitude spectrum for the hidden Markov model (HMM)-based speech synthesis system (HTS). The residual signal obtained from inverse filtering is decomposed into periodic and aperiodic spectra in the frequency domain. The amplitude spectrum of half pitch period length is reserved as the periodic component in the synthesis stage and zero-phase criterion and pitch synch...

Publication date: 2015